Predicting hospital-onset Clostridium difficile using patient mobility data: A network approach
- Kristen Bush, Hugo Barbosa, Samir Farooq, Samuel J. Weisenthal, Melissa Trayhan, Robert J. White, Ekaterina I. Noyes, Gourab Ghoshal, Martin S. Zand
- Journal:
- Infection Control & Hospital Epidemiology / Volume 40 / Issue 12 / December 2019
- Published online by Cambridge University Press:
- 28 October 2019, pp. 1380-1386
- Print publication:
- December 2019
- Article
- Open access
Objective: To examine the relationship between unit-wide Clostridium difficile infection (CDI) susceptibility and inpatient mobility and to create contagion centrality as a new predictive measure of CDI.
Design: Retrospective cohort study.
Methods: A mobility network was constructed using 2 years of patient electronic health record data for a 739-bed hospital (n = 72,636 admissions). Network centrality measures were calculated for each hospital unit (node), providing clinical context for each in terms of patient transfers between units (ie, edges). Daily unit-wide CDI susceptibility scores were calculated using logistic regression and were compared to network centrality measures to determine the relationship between unit CDI susceptibility and patient mobility.
Results: Closeness centrality was a statistically significant measure associated with unit susceptibility (P < .05), highlighting the importance of incoming patient mobility in CDI prevention at the unit level. Contagion centrality (CC) was calculated using inpatient transfer rates, unit-wide susceptibility to CDI, and current hospital CDI infections. The contagion centrality measure was significantly associated (P < .05) with our outcome of hospital-onset CDI cases, and it captured the additional opportunities for transmission associated with inpatient transfers. We have used this analysis to create easily interpretable clinical tools showing this relationship as well as the risk of hospital-onset CDI in real time, and these tools can be implemented in hospital EHR systems.
Conclusions: Quantifying and visualizing the combination of inpatient transfers, unit-wide risk, and current infections help identify hospital units at risk of developing a CDI outbreak and, thus, provide clinicians and infection prevention staff with advance warning and specific location data to inform prevention efforts.
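As a rough illustration of the network measure named in the abstract above, the following sketch computes closeness centrality over incoming shortest paths on a small directed transfer graph. The unit names and transfers are hypothetical, the graph is unweighted, and the function `incoming_closeness` is an illustrative helper; the abstract does not specify the exact definition, weighting, or normalization the authors used.

```python
from collections import deque


def incoming_closeness(edges):
    """Closeness of each node based on shortest paths leading *into* it.

    edges: iterable of (source_unit, destination_unit) patient transfers.
    Score = (number of units that can reach the node) / (sum of their
    shortest-path distances), or 0.0 if no unit can reach it.
    """
    nodes = sorted({n for edge in edges for n in edge})
    # predecessors[v] lists the units that transfer patients directly to v.
    predecessors = {n: [] for n in nodes}
    for src, dst in edges:
        predecessors[dst].append(src)

    scores = {}
    for target in nodes:
        # Breadth-first search backwards from the target unit.
        dist = {target: 0}
        queue = deque([target])
        while queue:
            current = queue.popleft()
            for prev in predecessors[current]:
                if prev not in dist:
                    dist[prev] = dist[current] + 1
                    queue.append(prev)
        total = sum(dist.values())
        reachable = len(dist) - 1  # exclude the target itself
        scores[target] = reachable / total if total > 0 else 0.0
    return scores


# Hypothetical units and transfers, for illustration only.
transfers = [("ED", "ICU"), ("ED", "MedSurg"), ("ICU", "MedSurg"), ("MedSurg", "Rehab")]
closeness = incoming_closeness(transfers)
```

In this toy graph, a unit that receives patients directly from every upstream unit scores higher than one reached only through intermediate units, which is the sense in which closeness reflects incoming mobility.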
2416: A machine learning pipeline to predict acute kidney injury (AKI) in patients without AKI in their most recent hospitalization
- Samuel Weisenthal, Samuel J. Weisenthal, Caroline Quill, Jiebo Luo, Henry Kautz, Samir Farooq, Martin Zand
- Journal:
- Journal of Clinical and Translational Science / Volume 1 / Issue S1 / September 2017
- Published online by Cambridge University Press:
- 10 May 2018, pp. 17-18
- Article
- Open access
OBJECTIVES/SPECIFIC AIMS: Our objective was to develop and evaluate a machine learning pipeline that uses electronic health record (EHR) data to predict acute kidney injury (AKI) during rehospitalization for patients who did not have an AKI episode in their most recent hospitalization.
METHODS/STUDY POPULATION: The protocol under which this study falls was given exempt status by our institutional review board. The fully deidentified data set, containing all adult hospital admissions during a 2-year period, is a combination of administrative, laboratory, and pharmaceutical information. The administrative data set includes International Classification of Diseases, 9th Revision (ICD-9) diagnosis and procedure codes, Current Procedural Terminology, 4th Edition (CPT-4) procedure codes, diagnosis-related grouping (DRG) codes, locations visited in the hospital, discharge disposition, insurance, marital status, gender, age, ethnicity, and total length of stay. The laboratory data set includes bicarbonate, chloride, calcium, anion gap, phosphate, glomerular filtration rate, creatinine, urea nitrogen, albumin, total protein, liver function enzymes, and hemoglobin A1c. The pharmacy data set includes, for each medication, a description, pharmacologic class and subclass, and therapeutic class. Data preprocessing was performed using the Python library pandas (McKinney, 2011). Top-level binary representation (Singh, 2015) was used for diagnosis and procedure codes. Categorical variables were transformed via one-hot encoding. Previous admissions were collapsed using rules informed by domain expertise (eg, the most recent age or sum of assigned diagnosis codes were retained as elements in the feature vector). We excluded any patient without at least 1 rehospitalization during the time window. We excluded any admission, with or without AKI, that directly followed a hospitalization in which AKI was present.
For comparison, we did not exclude such admissions in an otherwise identical experiment in which we considered any AKI event as a positive sample (regardless of AKI presence in the most recent hospitalization). We defined an AKI event as an assignment of any of the acute kidney failure (AKF) ICD-9 codes [584.5, AKF with lesion of tubular necrosis; 584.6, AKF with lesion of renal cortical necrosis; 584.7, AKF with lesion of renal medullary (papillary) necrosis; 584.8, AKF with other specified pathological lesion in kidney; or 584.9, AKF, unspecified]. Since diagnosis codes are believed to be specific but not sensitive for AKI (Waikar, 2006), we supplemented them using creatinine for patients who had laboratory values. Diagnosis was made according to the Kidney Disease: Improving Global Outcomes (KDIGO) Practice Guidelines (AKI defined as a 1.5-fold or greater increase in serum creatinine from baseline within 7 d or a 0.3 mg/dL or greater increase in serum creatinine within 48 h). We report preliminary model discrimination via area under the receiver operating characteristic curve (AUC) using k-fold cross validation grouped by patient identifier (to ensure that admissions from the same patient would not appear in both the training and validation sets). It was confirmed that the prevalence of positive samples in the entire data set was maintained in each fold. The Python library scikit-learn (Pedregosa, 2011) was used for pipeline development, which consisted of imputation, scaling, and hyper-parameter tuning for penalized (l1 and l2 norm) logistic regression, random forest, and multilayer perceptron classifiers. All experiments were stored in IPython (Pérez, 2007) notebooks for easy viewing and result reproduction.
RESULTS/ANTICIPATED RESULTS: There were 107,036 adult patients who accounted for 199,545 admissions during a 2-year window. Per admission, there were at most 54 ICD-9 diagnoses, 38 ICD-9 procedures, 314 CPT-4 procedures, and 25 hospital locations visited.
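The creatinine-based KDIGO criterion quoted in the methods can be sketched as a simple check over a series of measurements. This is a minimal illustration, not the authors' implementation: the function name `meets_kdigo_creatinine` is hypothetical, and treating every earlier measurement as a candidate baseline is an assumption, since the abstract does not say how baseline creatinine was determined.

```python
def meets_kdigo_creatinine(measurements):
    """Return True if a creatinine series meets either KDIGO threshold.

    measurements: list of (hours_since_first_draw, serum_creatinine_mg_dl).
    Assumption: any earlier measurement may serve as the baseline.
    """
    ordered = sorted(measurements)
    eps = 1e-9  # tolerance so values such as 1.2 - 0.9 still count as a 0.3 rise
    for i, (t0, c0) in enumerate(ordered):
        for t1, c1 in ordered[i + 1:]:
            elapsed_hours = t1 - t0
            # Absolute rise of at least 0.3 mg/dL within 48 hours.
            if elapsed_hours <= 48 and c1 - c0 >= 0.3 - eps:
                return True
            # Relative rise to at least 1.5 times the earlier value within 7 days.
            if elapsed_hours <= 7 * 24 and c1 >= 1.5 * c0 - eps:
                return True
    return False


# Hypothetical series, for illustration only.
absolute_rise = meets_kdigo_creatinine([(0, 0.9), (24, 1.2)])
relative_rise = meets_kdigo_creatinine([(0, 0.8), (120, 1.3)])
no_event = meets_kdigo_creatinine([(0, 1.0), (72, 1.2)])
```

In the study, a check of this kind supplemented the ICD-9 code labels only for patients who had laboratory values.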
The admissions were 55% female, the average age was 46±20 years (mean±standard deviation), and the average length of stay was 2.5±8.0 days. We excluded 2360 admissions that involved an AKI event that directly followed an admission with an AKI event and 4130 admissions that did not involve an AKI event but directly followed an admission with an AKI event. In total, there were 4561 (5.3%) positive samples (AKI during rehospitalization without AKI in the previous stay) generated by 3699 unique patients and 81,458 negative samples (non-AKI during rehospitalization without AKI in the previous stay) generated by 31,831 unique patients. When using any AKI event as a positive sample (regardless of whether or not AKI was in the most recent stay), the prevalence was 7.3% (6921 positive samples generated by 4395 unique patients and 85,588 negative samples generated by 33,287 unique patients). Best results were achieved with a code precision of 3 digits, for which we had a total of 4556 features per patient. Fitted hyper-parameters corresponding to each classifier were logistic regression with l1 penalty C as 2×10⁻³; logistic regression with l2 penalty C as 1×10⁻⁶; random forest number of estimators as 100, maximum depth as 3, minimum samples per leaf as 50, minimum samples per split as 10, and entropy as the splitting criterion; and multilayer perceptron l2 regularization parameter α as 15, architecture as 1 hidden layer with 5 units, and learning rate as 0.001. Five-fold stratified cross validation on the development set yielded an average AUC of 0.830±0.006 for logistic regression with l1 penalty, 0.796±0.007 for logistic regression with l2 penalty, 0.828±0.007 for random forest, and 0.841±0.005 for multilayer perceptron. In an identical experiment for which an AKI event was considered a positive sample regardless of AKI presence in the most recent stay, we had 4592 features per sample with the same code precision.
Five-fold stratified cross validation on the development set with identical settings for the hyper-parameters yielded an average AUC of 0.850±0.004 for logistic regression with l1 penalty, 0.819±0.006 for logistic regression with l2 penalty, 0.853±0.004 for random forest, and 0.853±0.006 for multilayer perceptron.
DISCUSSION/SIGNIFICANCE OF IMPACT: Our objective was to investigate the feasibility of using machine learning methods on EHR data to provide a personalized risk assessment for “unexpected” AKI in rehospitalized patients. Preliminary model discrimination was good, suggesting that this approach is feasible. Such a model could help clinicians recognize AKI risk in patients who would not otherwise be suspected of it. The authors recognize several limitations. Since our data set corresponds to a time-window sample, patients with a high frequency of hospital utilization are likely overrepresented. Similarly, our data set contains records from only a single hospital network. Although we supplement with laboratory-based diagnosis, using diagnosis codes as labels is problematic, as numerous reports suggest low sensitivity of codes for AKI. Future work includes calibration analysis, incremental updating (“online learning”), and a representation learning-based (“deep learning”) extension of the model.
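For readers unfamiliar with the discrimination metric reported above, the sketch below computes AUC directly from its pairwise interpretation: the probability that a randomly chosen positive sample receives a higher score than a randomly chosen negative one, with ties counted as one half. The function `roc_auc` and the toy labels and scores are illustrative assumptions; the authors report using scikit-learn, and this quadratic-time version is meant only to show what the number measures.

```python
def roc_auc(labels, scores):
    """AUC as the fraction of (positive, negative) pairs ranked correctly."""
    positives = [s for y, s in zip(labels, scores) if y == 1]
    negatives = [s for y, s in zip(labels, scores) if y == 0]
    if not positives or not negatives:
        raise ValueError("AUC needs at least one positive and one negative sample")
    wins = 0.0
    for p in positives:
        for n in negatives:
            if p > n:
                wins += 1.0   # positive ranked above negative
            elif p == n:
                wins += 0.5   # tie counts as half
    return wins / (len(positives) * len(negatives))


# Toy data, for illustration only (1 = AKI, 0 = no AKI).
toy_labels = [0, 0, 1, 1]
toy_scores = [0.1, 0.4, 0.35, 0.8]
toy_auc = roc_auc(toy_labels, toy_scores)
```

An AUC of 0.5 corresponds to random ranking and 1.0 to perfect separation, which gives context for the cross-validated values between roughly 0.80 and 0.85 reported in the results.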